Historically, training for performance appraisal has focused on the
same issues as instrument development: the reduction of psychometric
errors in ratings. Efforts were centered around teaching people to use
rating scales properly. A review of the literature shows these programs
met with mixed success. While a meta-analysis of these data is
premature, several hypotheses may be drawn: (1) knowledge of the job in
question is more important than rating skills; (2) observational skills
are important in real-world ratings; (3) the purpose and context of
ratings are as or more important to accuracy than the training itself;
and (4) accuracy should be the primary goal of training. Training for
performance appraisal is far from universal. Most training efforts in
actual use involve learning how to use a particular form or system. One
possible training method to improve accurate evaluations involves the
use of multiple performance examples, such as videotape, to represent
multiple levels of accomplishment. Little systematic knowledge exists
about the mechanics of implementing a theoretically-based appraisal
system. It is necessary to understand how the appraisal system
functions in the operation of the organization. Considerations of
equity, of the multidimensionality of job performance, of the cost of
more refined observations may make more sophisticated measurement
impossible to achieve. Reliable, valid measures that provide accurate
determination of two or three levels of performance are an advance
over biased assessment of five or six or more. (JAC)
As some of you may know, much of my work over the last few years has
been concerned with theoretical analyses of the performance appraisal
process, in terms of some fairly abstract principles based on theories of
cognition and social perception. More recently (Feldman, 1981), I have
tried to turn these ideas into guidelines for the development of appraisal
instruments and training procedures for appraisers (who, after all, are
really the measuring instruments).

Today I'm going to try to deal with the practicalities of training for
appraisal. I'd like to consider some factors relevant to the development of
theoretically based instruments and the use of appropriate training. I will
not consider employee feedback per se, since the success of feedback depends
first on accurate assessment; the latter is my concern here. I'd also like
to suggest some areas of common concern for the laboratory-oriented,
theoretically minded researcher and the observation-oriented,
organizationally minded researcher/practitioner, consideration of which
might benefit both.
Historically, training for performance appraisal has focused on the
same issues as instrument development: the reduction of "psychometric
errors" in ratings. The frequency and size of halo errors,
leniency/stringency biases, contrast effects, and so forth were the
dependent variables of interest. Efforts were centered around teaching
people to use rating scales "properly", i.e. to avoid halo, contrast, etc.
These programs met with mixed success. Brown (1968) used a training program
emphasizing practice with the rating scale, a discussion of rating errors,
and an emphasis on "trait differentiation", finding reduced halo (increased
inter-scale variance) for trained peer raters on a set of six trait scales.
Borman (1975) found that brief training in the recognition and avoidance of
halo error reduced its magnitude in the rating of specially constructed
videotaped performance vignettes, though reliability decreased as well. In
agreement with Brown (1968), Bernardin & Walter (1977) found that training
and familiarization with behavioral expectation scales reduced halo and
leniency error in instructor ratings by students. This training apparently
improved the observational skills of students, as well as focusing their
attention on the performance dimensions covered by their Behavioral
Expectation Scale.

Bernardin (1978) found that a one-hour training session was more
effective than a five-minute session in reducing leniency and halo in
student ratings; however, the training effect disappeared after a few
months. Support for the proposition that longer, more detailed training
sessions (especially when used in conjunction with behaviorally-anchored
rating scales) are more effective also comes from Wexley, Sanders, and Yukl
(1973). Their effective interviewer training session involved discussion of
a job's requirements and applicant qualifications, a detailed evaluation
guide, videotaped examples of good, bad, and average performers together
with rating feedback and discussion of psychometric error. They were
successful in eliminating contrast errors in the rating of videotaped
stimuli. A similar six to eight hour program designed by Latham, Wexley &
Purcell (1975) likewise featured instruction in observational skills,
discussion of errors, feedback, and active participation in learning to
eliminate errors. They found that, six months later, the workshop training
eliminated contrast, halo, and similarity errors in comparison to a control
group, and recency effects compared to a "discussion training" condition.
Videotaped interviewees were the stimuli. Borman (1979) likewise found that
an intensive workshop reduced halo in the ratings of videotaped stimuli, but
did not improve accuracy.

Ivancevich (1979) used an even more intensive (three-day) training
procedure, compared to a three-day discussion group and a no-treatment
control. The intensive training involved videotaped performance examples
and feedback to managers. Intensive training was superior to the discussion
and control conditions in reducing halo and leniency in actual
administrative ratings six months after training, but the effect on halo was
reduced after 12 months. Warmke & Billings (1979), likewise conducting
research in an organizational setting, compared the effectiveness of
shortened discussion training, patterned after Latham, et al. (1975), with
lecture, participation in graphic rating scale construction, and a control
group. On experimental ratings, participation in scale construction and
lecture were most effective in reducing psychometric errors, and the lecture
group produced marginally greater interrater reliability. Interestingly, on
halo effect measures, there was a significantly greater degree of error when
ratings were made for administrative rather than research purposes.

To further confuse things, Bernardin & Pence (1980) found that training
to recognize and avoid rating errors, which included examples, discussion,
and feedback, was effective in reducing halo and leniency compared to a
control group and a second group trained in the dimensions of the job in
question. However, the trained group was less accurate than either of the
other groups in rating the hypothetical stimulus vignettes.
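
The difference between reducing halo and improving accuracy can be made
concrete with a small numerical sketch. The ratings, the "true" scores, and
both index definitions below are illustrative assumptions, not measures
taken from the studies cited: halo is indexed here by the variance of one
rater's scores across dimensions (less spread suggesting more halo), and
accuracy by mean absolute deviation from the true scores.

```python
from statistics import mean, pvariance

def between_dimension_variance(ratings):
    """Variance of one rater's scores across job dimensions.
    Low variance is read here as halo (an assumed, simplified index)."""
    return pvariance(ratings)

def mean_absolute_error(ratings, true_scores):
    """Average distance between ratings and true scores (lower = more accurate)."""
    return mean(abs(r - t) for r, t in zip(ratings, true_scores))

# Hypothetical ratee whose true performance really is uniform across dimensions.
true_scores = [6, 6, 6, 6]

untrained = [6, 6, 7, 6]      # little spread, close to the truth
error_trained = [3, 6, 7, 4]  # more spread ("less halo"), further from the truth

for label, ratings in [("untrained", untrained), ("error-trained", error_trained)]:
    print(label,
          "spread:", between_dimension_variance(ratings),
          "error:", mean_absolute_error(ratings, true_scores))
```

In this made-up case the error-trained rater shows more between-dimension
spread but a larger error, which is the pattern a halo index alone would
misread as improvement.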




These results correspond broadly to those obtained by Borman (1975,
1979), who found no effects of either short (1975) or long (1979) training
program on the accuracy with which subjects rated written vignettes or
videotaped stimuli. Apparently, training in the avoidance of psychometric
error changes rating behavior (e.g. leads to greater between-dimension
rating variance); but, as Bernardin & Pence (1980) concluded, the new
response sets may distort the representation of performance by incorrectly
lowering scores and removing "true" halo (Cooper, 1981).

Recognizing that a meta-analysis on these data is premature, and
realizing the dangers of drawing conclusions from narrative (and brief)
reviews, may any useful hypotheses be drawn from these studies? I think
yes. They are:

     1.  Knowledge of the job in question is as or more
         important than rating skills. Participation in
         scale development probably teaches one about the
         job. A useful program should teach appraisers what
         behaviors to look for as well as how to translate
         observations into numbers on paper.

     2.  Observational skills are important in real-world
         ratings, which must be made on the basis of events
         occurring over long periods of time.

     3.  The purpose and context of ratings are as or more
         important to accuracy than the training itself.

     4.  As stated by Latham & Wexley (1981) and Cooper
         (1981), "psychometric errors" in actual administrative
         ratings may not be errors at all, but may reflect
         real intercorrelation among job dimensions (halo),
         or high or low levels of work-group performance
         (leniency-stringency). It follows that accuracy
         should be the primary goal of training.

That observational accuracy can be taught is supported by Thornton &
Zorich (1980) as well as Bernardin & Walter (1977). The importance of
rating purpose is underscored by Zedeck & Cascio (1982), who found that
training had no effect on the evaluation of hypothetical supermarket
employees, but purpose of the evaluation (merit raise vs. employee
development or retention) did. In particular, the "raise" decision resulted
in less differentiation among the hypothetical employees.

Finally, the idea that observational accuracy is related to accuracy in
appraisal is supported by Murphy, Garcia, Kerkar, Martin, and Balzer (1982),
in their study of the evaluation of videotaped lecture performance. While
their study did not focus on training effects, it seems reasonable that, if
individual differences in observational accuracy are related to differences
in appraisal accuracy, then training to improve the former should also
improve the latter.



The program most clearly related to the hypotheses above is that of
Latham & Wexley (1981). This involves an intensive workshop focussing on a
succession of psychometric errors (e.g., similarity, halo) with videotaped
stimuli and behavioral feedback. The program also trains observational
skills. They report that this type of program is successful in actual
practice, in that the training applied to a group of supervisors improved
criterion reliability and validity sufficiently so that a previously
"invalid" selection battery successfully predicted the "new" criterion
scores. Unfortunately it is not known just what aspects of the training
contributed to the improvement. Given the data presented above, though, I
would hypothesize that the observational skills component was largely
responsible for the increase in validity. A study comparing training in the
several components of the program with a group taking the complete training
course (and including no-treatment and placebo groups) would properly test
this hypothesis.

Their program does not deal with training in the job itself.
Apparently, they assume that supervisors are sufficiently knowledgeable and
competent to render this step superfluous. They seem to rely on knowledge
of the principles of observation, judgement, and rating to transfer to the
job in question.

In conclusion, I think the evidence is consistent with the contention
that training in the avoidance of psychometric error, in and of itself, is
not helpful. The people who become more accurate appraisers most probably
learned what kinds of behaviors to observe, and how to observe and recall
them in the context of a valid conception, or schema, of the job. This
conclusion agrees with Borman's (1979) regarding the goals of training.

Training in Practice

Training for performance appraisal is far from universal, whether in
the private or the public sector. Estimates of its frequency range from 75%
to less than 25% (De Vries, Morrison, Shullman & Gerlach, 1981). In an
attempt to discover the nature and content of current training efforts, I
searched several practitioners' journals from the most recent issue back
through 1978. The results were disappointing, to say the least.
While articles about performance appraisal are not infrequent, very few
mention training at all. In one recent survey of appraisal practices (Teel,
1980), no mention was made of training for appraisers. Some do mention the
need for training (e.g., Wells) to assure consistent application of
the appraisal system. One author (Beaulieu, 1980) recommends at least 40
hours of training for appraisal, in addition to appraisal monitoring and
followup systems. But neither specifies exactly what the training is
supposed to include, nor how it is to be accomplished.

Only one paper (Robinson & Robinson, 1978) dealt with the nature of
training. The authors discussed Performax, a modelling-based program
designed to teach managers how to conduct goal-setting, performance
feedback, and appraisal. Managers are taught to establish specific goals and
standards and provide daily feedback. They are apparently not taught how to
establish standards or assess one's level of success in meeting them. These
vital skills go largely unmentioned in the practitioner-oriented literature.
They may be learned via modelling of one's own supervisor, implying a large
chance component in skill development. Skill training may also be part of
some consultants' programs; if so, the practice seems far from universal.

One large-scale performance appraisal program (Gomez-Mejia, Page, &
Tornow, Notes 1, 2) includes a more ambitious training program. The training
their system provides includes computer-based instruction in the requisite
company policies and the use of the appraisal system, but also includes a
12-hour workshop on the appraisal process itself. This workshop seems very
similar to the training described by Latham, et al. (1975), though the actual
training content is not discussed. Their extensive workshop and
computer-based training do allow the possibility of training and feedback in
observation, encoding, and judgment skills.

The most defensible conclusion I can draw from this effort is that most
training efforts in actual use involve learning how to use a particular form
or system (e.g., Haynes, 1978). In contrast, the training programs
recommended in the applied academic literature focus on the elimination of
psychometric error in ratings and the development of observational skills.
The fact that most appraisal systems in use involve some form of training
indicates that a recognized need for training exists. It is the
responsibility of the academic researcher to develop more useful forms of
training, and to demonstrate that usefulness in ways that lead to the
adoption of our best programs. What follows is an outline of the form I
believe training should take.

Training Based on Theory

So far, we have seen two forms of training: the first derived from
empirical work with minimal theoretical background, the second from the
popularization of that work as well as the earlier "form-centered" research
on appraisal. What improvements can the newer theory-centered approaches
promise the practitioner?

One thing that should be remembered is that any new approach is going
to contain elements of previous practice. Just as a 14th century archer did
not need Newtonian mechanics to hit the target, so good empirical research
and practice may be valid without extensive theoretical underpinnings. The
role of theory here is to explain what is observed and to improve practice
by pointing out relationships not previously considered. But there are a
lot of steps between theory and technology.

First of all, we must differentiate between observation, encoding,
storage, recall, and evaluation or rating. Accurate evaluations depend
first of all on the observer attending to important and relevant behaviors,
then encoding or categorizing those appropriately, and recalling them when
needed. In my system, this depends very much on the appraiser's
cognitive structure or category system as well as transient factors
influencing his or her available categories. It follows from this that part
of the success of previous training programs is due to their focus on the
behavioral definition of explicit job dimensions, either through scale
development, lecture, or discussion. These become part of the "job schema"
or category/prototype system used in appraisal. To the extent trainees
learn to recognize relevant behaviors automatically, an important component
of accuracy is added to the appraisal process.

As one might expect, those who are more experienced and better at the
job are more valid raters (Landy & Farr, 1980). An interesting study by
Levy (1960; reported in Campbell, Dunnette, Lawler, & Weick, 1970) showed
that high-performing accounting supervisors' evaluations of subordinates
correlated with subordinate intelligence, while poorer-performing
supervisors' ratings correlated with clerical aptitude. This at least makes
plausible the notion that one's "implicit theory" (schema) of the job
influences one's ratings, and that training should cover important aspects
of the subordinate's job itself.

Second, we must teach the translation of events into judgments. What
actions are regarded as good or poor, and how good or poor? We have seen
that contrast effects may bias such judgments, and that training in scale
use may alleviate them. This is the point at which instrument-centered
training and feedback and anchoring stimuli (as in behavioral expectation
scales) are most useful. At this point, raters should be taught to avoid
bias caused by job-irrelevant categorization (e.g., race, sex) or overall
evaluative impressions so as to reduce illusory halo (e.g., Nathan & Lord,
1983).

One possible training method would involve the use of multiple
performance examples (e.g. videotaped performances, or products of
performance) sampled so as to represent multiple levels of accomplishment on
each of the schema-given dimensions of job behavior. The trainee could
evaluate relevant performance dimensions at a computer console and get
immediate feedback as to the fit of his or her judgements to an "ideal"
evaluation model. After initial training, interpolated task activity could
be introduced between observation and rating, so that both short and
long-term memory for (and encoding of) behaviors could be assessed. Such
training could include examples of job performance by people differing on
job-irrelevant dimensions (e.g. age, race, sex) so that potential biases
could be "trained out" of the rating response. The method is similar to the
procedures used in concept attainment studies. It may be modified as
appropriate for different types of tasks, as discussed below.
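
The feedback step of such a method might be sketched as follows. Everything
in this sketch is a hypothetical illustration: the dimension names, the 1-7
scale, the tolerance, and the "ideal" scores stand in for whatever an expert
panel would supply for each recorded performance example.

```python
from dataclasses import dataclass

@dataclass
class PerformanceExample:
    """One recorded performance example with expert ("ideal") scores per dimension."""
    label: str
    ideal: dict  # dimension name -> ideal rating on an assumed 1-7 scale

def feedback(example, trainee_ratings, tolerance=1):
    """Compare a trainee's ratings with the ideal model, dimension by dimension.

    Returns a dict mapping each dimension to the signed deviation
    (trainee minus ideal) when it exceeds the tolerance, else 0.
    """
    report = {}
    for dimension, ideal_score in example.ideal.items():
        deviation = trainee_ratings[dimension] - ideal_score
        report[dimension] = deviation if abs(deviation) > tolerance else 0
    return report

# Hypothetical example and trainee response.
example = PerformanceExample(
    label="tape 3",
    ideal={"planning": 6, "accuracy": 2, "customer contact": 4},
)
trainee = {"planning": 6, "accuracy": 5, "customer contact": 3}

for dimension, deviation in feedback(example, trainee).items():
    if deviation:
        print(f"{dimension}: off by {deviation:+d}")
    else:
        print(f"{dimension}: within tolerance")
```

Reporting the signed deviation per dimension, rather than a single overall
score, is a design choice made here so that the trainee can see which
dimension was misjudged and in which direction.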

A third point, taken from the cognitive perspective, is that different
evaluation instruments and different types of training are appropriate for
evaluating the performance of different types of jobs. This echoes De Vries,
et al. (1981), though it was developed independently (Feldman, 1981). My
thesis in that earlier paper is taken from Hammond (1981): there is a
continuum of cognitive tasks, anchored at one extreme by the "analytic" and
at the other by the "intuitive". The midpoint is represented by the
"quasi-rational" task. An analytic task is represented by a mathematics
problem, or mechanical assembly; there is an unequivocal standard for
judging its performance, and the process of its performance is accessible to
consciousness. The intuitive task is exemplified by the building of
scientific theory, or creativity in the arts. The process of solution is not
entirely accessible to consciousness, and there are multiple standards of
evaluation that can be applied. The quasi-rational task contains elements of
both: the job of architect, for example, has analytic elements (structural
specifications) and intuitive elements (artistic merit).

Lower-level jobs are often analytic, and may be defined by
behaviorally-anchored scales and objective criteria. The evaluator must be
trained to be an observer and recorder; presumably, the value of each
performance dimension may be discovered by cost accounting or some validated
estimation procedure (e.g. Bobko, Karren, & Parkington, 1983), so that the
appraiser does not have to scale the behaviors.

Upper-level jobs are often quasi-rational or intuitive in nature. The
intuitive component requires the appraiser to either choose or develop an
appropriate task schema and then use it to evaluate performance. This, in
my opinion, is the theoretical basis for any usefulness in MBO and similar
procedures, as recommended by De Vries, et al. (1981). The evaluator must
be trained not only in observation and evaluation, but in the multiplicity
of possible approaches to the job in question. In this case, job experience
or reputation does not guarantee adequate evaluation; "scholarship" is
needed. The critic may not be able to act or paint, but must know a lot
about acting or painting.

Finally, for the quasi-rational task, both kinds of skills are needed,
as appropriate to the task dimension. That is, analytic task dimensions
must be evaluated using one kind of form, with a particular type of
training. Intuitive task dimensions require different forms, and different
training. Finally, the two types of evaluations must be weighted and
combined to produce an overall judgement appropriate to the decision in
question.

It is appropriate to note here that cognitive/developmental
psychologists have recently begun to stress the importance of previous
knowledge to new learning (Siegler, 1983). Children (and, I am willing to
bet, adults as well) are said to learn by experiencing exceptions to
previously held rules for encoding and inference, whereupon new rules are
adopted and tested. It is well-known that experts use different categories
and rules than novices for encoding and inference. It follows that one
important function of applied research must be the discovery of the
categories and inference rules that are presently used, in order to aid
training in new category systems and inference rules where necessary. The
training method discussed earlier can accomplish this.

What evidence is there for the usefulness of this approach? Frankly,
none. Borman's (1979) finding that different rating formats were more
accurate for different jobs, though limited, is at least consistent with
these ideas. It is also encouraging that others have come up with somewhat
similar ideas. I do, however, have some ideas about how to test these
notions.

The basic strategy is one of construct validation. Job dimensions
should be analyzable in terms of well-validated ability and/or personality
constructs. If an appraisal procedure is in fact more accurate or less
biased, performance as measured by that procedure ought to correlate with
measures of the relevant abilities or dispositions and not with others.
These correlations should be higher than those obtained with other
equally-reliable appraisal procedures. Furthermore, to the extent that
performance as measured in Job 1 depends on dimensions also common to Job 2,
appraisals on Job 1 should be valid predictors of Job 2 performance, more
valid than other predictors. Appropriate evaluator training and appropriate
evaluation instruments ought to improve real-world predictability;
inappropriate training should reduce the obtained correlations. We should
also find that experienced, competent incumbents rate subordinates and peers
as expected on the basis of independently-derived job schemata, and
longitudinal studies should show the development of these schemata over
time. For intuitive tasks, the ability to generate and use multiple schemata
should exist in experts, regardless of their preference for one particular
schema or another.
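
The first of these predictions, that a better appraisal procedure should
correlate with relevant abilities and not with irrelevant ones, can be
illustrated with a short sketch. The six employees, both ability measures,
and both sets of appraisal scores are invented for illustration; a real test
would need far larger samples and validated measures.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores for six employees.
relevant_ability = [10, 12, 15, 18, 20, 25]    # construct the job dimension should tap
irrelevant_ability = [14, 9, 16, 11, 15, 12]   # construct it should not tap
appraisal_a = [2, 3, 3, 5, 5, 7]               # ratings from procedure A
appraisal_b = [4, 2, 5, 3, 5, 4]               # ratings from procedure B

def validation_summary(appraisal):
    """Correlate one procedure's ratings with relevant and irrelevant constructs."""
    return {
        "convergent": pearson(appraisal, relevant_ability),
        "discriminant": pearson(appraisal, irrelevant_ability),
    }

for name, scores in [("A", appraisal_a), ("B", appraisal_b)]:
    summary = validation_summary(scores)
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

With these made-up numbers, procedure A shows the predicted pattern (a high
correlation with the relevant ability and a near-zero one with the irrelevant
ability), while procedure B tracks the irrelevant ability more closely, which
is the pattern expected of a biased procedure.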

Contextual Moderators of Training Effectiveness 

So far, I have been dealing with training in a vacuum, as if the
evaluator was free to give any rating he or she desired, and as if accurate
appraisal was the only goal of the appraisal system. Neither assumption is
generally true.

As Ilgen and I discussed in our 1983 paper, performance appraisal is an
integral part of organizational functioning. Training people to use a
system that does not fit the realities of their organization is at best a
waste of time for all concerned. Consider the military and civil-service
performance appraisal systems. In the military, an elaborate set of forms
and procedures is used for "form's sake", but is essentially meaningless.
"Real" evaluations are communicated by a series of key words, known through
experience and word-of-mouth. In the civil service, the system is so formal
and legalistic as to prevent meaningful personnel actions, and any attempt
to change the appraisal system requires revamping the entire structure.

The private sector would seem to offer more flexibility, but even here
the requirements of accurate appraisal are often subordinate to individual
and group agendas. Organizational politics may require promotions for
certain subordinates, regardless of their relative qualifications. "Keeping
the peace" in a work group may require equal raises for all, again
regardless of performance differences. "Merit-based" pay may require high
evaluations for all when salary budgets are bountiful, and low evaluations
when they are lean. Such factors are far more powerful than the ideal of
appraisal accuracy, and will exist regardless of training. It is therefore
necessary to influence the entire structure and reward system of the
organization if we expect the appraisal system to function properly.

Benefits and Costs of Appraisal Systems and Training

It should be possible to justify the necessary organizational changes
on a purely economic basis. Data on the increased predictability of job
performance resulting from improved appraisals can be used to project
economic benefit to the firm, as done by Schmidt, Hunter, McKenzie & Muldrow
for selection devices. Likewise, improvements in job satisfaction
have consequences for turnover and other costly behaviors, and a better
appraisal system may contribute importantly to satisfaction. Accurate
appraisals also allow pay to be used in a maximally motivating manner (e.g.,
Lawler, 1981), improving both morale and productivity. We do not have
accurate estimates of the financial outcomes of such interventions, but
their estimation is certainly feasible.

Other potential benefits derive from the current legal environment.
Recent court decisions have established that performance criteria must be
standardized, objective, and job-related, and based on a formal job
analysis. Appraisers should be trained in the system, which itself should
pertain to well-defined standards of behavior or performance. In addition,
performance criteria used for promotion decisions must meet the same
standards as other selection devices. A program of instrument development
and training based on cognitive theory, if supported by both laboratory and
field-study results, meets these standards.

Finally, and perhaps most importantly, employees can benefit from
reductions in role ambiguity, from clear standards for reward and
advancement, from recognition of the truly outstanding performers, and from
a system which admits less personalistic and group-centered bias.

Costs of such a system are perhaps more difficult to estimate.
Estimating direct costs of development, of course, is not a great problem:
consulting fees, man-hours, computer time, training time, and so forth can
be handled easily. Other costs (time lost due to the change of established
power relationships, anxiety, initial dissatisfaction, etc.) will be
extraordinarily difficult to quantify.

Perhaps we may estimate these by looking at similar large-scale
organizational changes (job enrichment, for example) as a way of setting
upper and lower bounds on costs. Early, small-scale implementation of such
programs (e.g., in a few plants of a large corporation) using
quasi-experimental techniques may also help cost/benefit estimates. If, as
De Vries, et al. (1981) and Teel (1980) state, appraisal systems undergo
frequent revisions, the incremental costs of an innovative system can be
more easily justified.

Problems of Implementation
In general, little systematic knowledge exists about the mechanics of
implementing a theoretically-based appraisal system. We know that
acceptability of any new system is important in practice, and that a system
that presupposes a fundamental change in organizational relationships (at
least sometimes) is likely to be unacceptable to some. It is also likely to
be resisted and sabotaged, regardless of high level support.

In order to understand how to introduce a new appraisal system and make
it effective, we must first understand how the appraisal system functions in
the operation of the organization. This is a task for the
observation-oriented researcher. Systematic, quantitative observational data
are needed, not anecdotes or case studies; these data should focus on the
characteristics of formal and informal appraisal systems across
organizations of different types, of different degrees of success, in
different cultures, under different economic constraints. I take as a
fundamental assumption that systems, both formal and informal, evolve to
serve some purpose. We need to discover the systems that exist, and their
purposes. How, for example, is employee performance represented in Japan, in
both traditional organizations and more Western ones? How does this system
differ from that in other Oriental locales (e.g., Taiwan, Hong Kong)? How
does the system differ by industry? By the degree of "industrial democracy"
as found in many European nations? The more we know about the kinds of
systems that exist, their precursors and their ramifications, the better we
can plan for changes in our own system.

At the individual level, we should investigate the nature of category
systems and inference rules that actually exist. We may, for example,
discover that similar kinds of schemata and rules are commonly used in
organizations with more valid appraisal systems, or that expert appraisers
use similar rules regardless of organization. This may be one more
ramification of the generality of cognitive skill.

Finally, we should face the possibility that we may not be able to
refine our measures of job performance past the point of identifying two or
three levels of contribution. Considerations of equity, of the
multidimensionality of job performance, of the cost of more refined
observations and so forth may make more sophisticated measurement impossible
to achieve. If so, we can still make sure that the measures we use are as
reliable and valid as the state of the art will allow, and take comfort in
the fact that accurate determination of two or three levels of performance
is an advance over the unreliable or biased assessment of five, six, or
more.